Abstract — Manifold learning is a hot research topic in the field of computer science and has many applications in the real world. A 
main drawback of manifold learning methods is, however, that there are no explicit mappings from the input data manifold to the output 
embedding. This prohibits the application of manifold learning methods in many practical problems such as classification and target 
detection. Previously in order to provide explicit mappings for manifold learning methods, many methods have been proposed to get 
an approximate explicit representation mapping with the assumption that there exists a linear projection between the high-dimensional 
data samples and their low-dimensional embedding. However, this linearity assumption may be too restrictive. In this paper, an explicit 
nonlinear mapping is proposed for manifold learning, based on the assumption that there exists a polynomial mapping between the high- 
dimensional data samples and their low-dimensional representations. As far as we know, this is the first time that an explicit nonlinear 
mapping for manifold learning is given. In particular, we apply this to the method of Locally Linear Embedding (LLE) and derive an 
explicit nonlinear manifold learning algorithm, named Neighborhood Preserving Polynomial Embedding (NPPE). Experimental results 
on both synthetic and real-world data show that the proposed mapping is much more effective in preserving the local neighborhood 
information and the nonlinear geometry of the high-dimensional data samples than previous work. 

Index Terms — Manifold learning, nonlinear dimensionality reduction, machine learning, data mining. 



1 Introduction 

MANIFOLD learning has drawn great interest since 
it was first proposed in 2000 ( H], (H, H) as 
a promising nonlinear dimensionality reduction (NDR) 
method for high-dimensional data manifolds. Its basic 
assumption is that high-dimensional input data samples 
lie on or close to a low-dimensional smooth manifold 
embedded in the ambient Euclidean space. For example, 
by rotating the camera around the same object with 
fixed radius, images of the object can be viewed as a 
one-dimensional curve embedded in a high-dimensional 
Euclidean space, whose dimension equals the number 
of pixels in the image. Under the manifold assumption, 
manifold learning methods aim to extract the intrinsic 
degrees of freedom underlying the input high-dimensional 
data samples, by preserving local or global 
geometric characteristics of the manifold from which the 
data samples are drawn. In recent years, various manifold 
learning algorithms have been proposed, such as 
locally linear embedding (LLE) [2], [3], ISOMAP [4], [5], 
Laplacian eigenmap (LE) [12], diffusion maps (DM) [14], 
local tangent space alignment (LTSA) [11], and Riemannian 
manifold learning [13]. They have achieved great 
success in finding meaningful low-dimensional embeddings 
for high-dimensional data manifolds. Meanwhile, 
manifold learning also has many important applications 
in real-world problems, such as human motion detection 
[17], human face recognition [18], classification and 
compressed expression of hyper-spectral imagery [19], 
dynamic shape and appearance classification [20], and 
visual tracking [21]-[23]. 

However, a main drawback of the manifold learning 
methods is that they learn the low-dimensional repre- 
sentations of the high-dimensional input data samples 
implicitly. No explicit mapping relationship from the 
input data manifold to the output embedding can be 
obtained after the training process. Therefore, in order 
to obtain the low-dimensional representations of newly 
arriving samples, the learning procedure has to be run 
again with all previous samples and the new samples as 
inputs. Such a strategy is extremely time-consuming for 
sequentially arriving data, which greatly limits the 
application of manifold learning methods to many 
practical problems, such as classification, target 
detection, and visual tracking. 

In order to address the lack of explicit mappings, 
many linear projection based methods have been 
proposed for manifold learning by assuming that there 
exists a linear projection between the high-dimensional 
input data samples and their low-dimensional representations, 
such as Locality Preserving Projections 
(LPP) [24], [25], Neighborhood Preserving Embedding 
(NPE) [26], Neighborhood Preserving Projections (NPP) 
[27], Orthogonal Locality Preserving Projections (OLPP) 
[28], Orthogonal Neighborhood Preserving Projections 
(ONPP) [29], [30], and Graph Embedding [31]. Although 
these methods have achieved success in many problems, 
the linearity assumption may still be too restrictive. 

On the other hand, several kernel-based methods have 
also been proposed to give nonlinear but implicit mappings 
for manifold learning (see, e.g., [32]-[35]). These 
methods reformulate the manifold learning methods as 
kernel learning problems and then utilize the existing 
kernel extrapolation techniques to find the location of 
new data samples in the low-dimensional space. The 
mappings provided by the kernel-based methods are 
nonlinear and implicit. Furthermore, the performance of 
these methods depends on the choice of the kernel func- 
tions, and their computational complexity is extremely 
high for very large data sets. 

In this paper, an explicit nonlinear mapping for mani- 
fold learning is proposed for the first time, based on the 
assumption that there exists a polynomial mapping from 
the high-dimensional input data samples to their low- 
dimensional representations. The proposed mapping has 
the following main features. 

1) The mapping is explicit, so it is straightforward 
to locate any new data samples in the low-dimensional 
space. This is different from traditional 
manifold learning methods such as LLE, 
LE, and ISOMAP El, in which the mapping is implicit 
and it is not clear how new data samples can 
be embedded in the low-dimensional space. Compared 
with kernel-based mappings, the proposed 
mapping does not depend on the choice of a specific 
kernel in finding the low-dimensional representations of 
new data samples. 

2) The mapping is nonlinear. In contrast to the linear 
projection-based methods which find a linear pro- 
jection mapping from the input high-dimensional 
samples to their low-dimensional representations, 
the proposed mapping provides a nonlinear polynomial 
mapping from the input space to the reduced 
space. A polynomial mapping is better suited than a 
linear one to handling data samples lying on nonlinear 
manifolds. Meanwhile, our analysis and experiments show 
that the proposed mapping has a computational complexity 
similar to that of the linear projection-based methods. 

Combining this explicit nonlinear mapping with ex- 
isting manifold learning methods (e.g. LLE, LE, Isomap) 
can give explicit manifold learning algorithms. In this 
paper, we concentrate on the LLE manifold learning 
method and propose an explicit nonlinear manifold 
learning algorithm called Neighborhood Preserving 
Polynomial Embedding (NPPE). Experiments 
on both synthetic and real-world data have been conducted 
to illustrate the validity and effectiveness of the 
proposed mapping. 

The remaining part of the paper is organized as 
follows. Section 2 gives a brief review of the existing 
manifold learning methods including those based on 
linear projections and kernel-based nonlinear mappings. 
Details of the explicit nonlinear mapping for manifold 
learning are presented in Section 3, whilst the NPPE 
algorithm is given in Section 4. In Section 5, experiments 
are conducted on both synthetic and real-world data sets 
to demonstrate the validity of the proposed algorithm. 
Conclusions are given in Section 6. 

2 Related Works 

In this section, we briefly review existing manifold learn- 
ing algorithms including those based on linear projec- 
tions and out-of-sample nonlinear extensions for learned 
manifolds. 

For convenience of presentation, the main notations 
used in this paper are summarized in Table 1. Throughout 
this paper, all data samples are in the form of column 
vectors. Matrices are expressed using normal capital 
letters and data vectors are represented using lowercase 
letters. The superscript of a data vector is the index of 
its component. 

2.1 Manifold Learning Methods 

According to the geometric characteristics which are 
preserved, existing manifold learning methods can be 
cast into two categories: local or global approaches. 

Among the local approaches, Locally Linear Embedding (LLE) 
[2], [3] preserves local reconstruction weights. Locally 
Multidimensional Scaling (LMDS) HI preserves local 
pairwise Euclidean distances among data samples. Maximum 
Variance Unfolding (MVU) [10] also preserves 
pairwise Euclidean distances in each local neighborhood, 
but it maximizes the variance of the low-dimensional 
representations at the same time. Local Tangent Space 
Alignment (LTSA) [11] keeps the local tangent structure. 
Diffusion Maps [14] preserves local pairwise diffusion 
distances from the high-dimensional data to the 
low-dimensional representations. Laplacian Eigenmap (LE) 
[12] preserves the local adjacency relationship. 

TABLE 1 
Main notations used in this paper. 

$\mathbb{R}^n$      $n$-dimensional Euclidean space where input samples lie 
$\mathbb{R}^m$      $m$-dimensional Euclidean space, $m < n$, where the low-dimensional embedding lies 
$x_i$               $x_i = (x_i^1, \ldots, x_i^n)^T$, the $i$-th input sample in $\mathbb{R}^n$, $i = 1, 2, \ldots, N$ 
$\mathcal{X}$       $\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$, the set of input samples 
$X$                 $X = [x_1 \ x_2 \ \cdots \ x_N]$, $n \times N$ matrix of input samples 
$y_i$               $y_i = (y_i^1, \ldots, y_i^m)^T$, low-dimensional representation of $x_i$ obtained by manifold learning, $i = 1, 2, \ldots, N$ 
$\mathcal{Y}$       $\mathcal{Y} = \{y_1, y_2, \ldots, y_N\}$, the set of low-dimensional representations 
$Y$                 $Y = [y_1 \ y_2 \ \cdots \ y_N]$, $m \times N$ matrix of low-dimensional representations 
$I_m$               Identity matrix of size $m$ 
$\|\cdot\|_2$       L2-norm, where $\|v\|_2 = \sqrt{\sum_{k=1}^{m} (v^k)^2}$ for an $m$-dimensional vector $v$ 

Among the global approaches, Isometric Feature Mapping 
(ISOMAP) [4], [5] preserves the pairwise geodesic distances 
among the high-dimensional data samples and 
their low-dimensional representations. Hessian Eigenmaps 
(HLLE) [15] extends ISOMAP to more general 
cases where the set of intrinsic degrees of freedom may 
be non-convex. In Riemannian Manifold Learning (RML) 
[13], the coordinates of data samples in the tangential 
space are preserved to be their low-dimensional 
representations. 

2.2 Linear Projections for Manifold Learning 

Manifold learning algorithms based on linear projections 
assume that there exists a linear projection which maps 
the high-dimensional samples into a low-dimensional 
space, that is, 

$$ y_i = U^T x_i, \quad \text{where } U \in \mathbb{R}^{n \times m}, \tag{1} $$

where $x_i$ is a high-dimensional sample and $y_i$ is its 
low-dimensional representation. Denote by $u_i$ the $i$-th 
column of $U$. Then, from a geometric point of view, data 
samples in $\mathbb{R}^n$ are projected into an $m$-dimensional linear 
subspace spanned by $\{u_i\}_{i=1}^{m}$. The low-dimensional 
representation $y_i$ is the coordinate of $x_i$ in $\mathbb{R}^m$ with respect 
to the basis $\{u_i\}_{i=1}^{m}$. 

2.2.1 LPP 

Locality Preserving Projections (LPP) [24], [25] provides 
a linear mapping for Laplacian Eigenmaps (LE), by 
applying (1) to the training procedure of LE. The LE 
method aims to train a set of low-dimensional 
representations $\mathcal{Y}$ which can best preserve the adjacency 
relationship among the high-dimensional inputs $\mathcal{X}$. If $x_i$ and 
$x_j$ are "close" to each other, then $y_i$ and $y_j$ should also 
be so. This property is achieved by solving the following 
constrained optimization problem 

$$ \min \ \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \|y_i - y_j\|_2^2 \tag{2} $$
$$ \text{s.t.} \ \sum_{i=1}^{N} D_i y_i y_i^T = I_m , \tag{3} $$

where the penalty weights $W_{ij}$ are given by the heat 
kernel $W_{ij} = \exp(-\|x_i - x_j\|_2^2 / t)$ and $D_i = \sum_{j} W_{ij}$. 

In LPP, equation (1) is applied to (2) and (3), that is, 
each $y_i$ is replaced with $U^T x_i$. By a straightforward 
algebraic calculation, (2) and (3) are transformed 
into 

$$ \min \ \mathrm{tr}(U^T X L X^T U) \tag{4} $$
$$ \text{s.t.} \ U^T X D X^T U = I_m , \tag{5} $$

where $W = (W_{ij})$, $L = D - W$ and $D$ is the diagonal 
matrix whose $(i,i)$-th entry is $D_i$. This optimization 
problem leads to a generalized eigenvalue problem 

$$ X L X^T u_i = \lambda_i X D X^T u_i , $$

and the optimal solutions $u_1, u_2, \ldots, u_m$ are the 
eigenvectors corresponding to the $m$ smallest eigenvalues. 

Once $\{u_i\}_{i=1}^{m}$ are computed, the linear projection 
matrix provided by LPP is given by $U = [u_1 \ u_2 \ \cdots \ u_m]$. 
For any new data sample $x$ from the high-dimensional 
space $\mathbb{R}^n$, LPP finds its low-dimensional representation 
$y$ simply by $y = U^T x$. 

2.2.2 NPP and NPE 

The linear projection mapping for Locally Linear Embedding 
(LLE) is independently provided by Neighborhood 
Preserving Embedding (NPE) [26] and Neighborhood 
Preserving Projections (NPP) [27]. Similarly to LPP, NPE 
and NPP apply the linear projection assumption (1) 
to the training process of LLE and reformulate the 
optimization problem in LLE so as to compute the linear 
projection matrix. 

During the training procedure of LLE, a set of linear 
reconstruction weights $\{W_{ij}\}_{i,j=1}^{N}$ are first computed by 
solving a convex optimization problem 

$$ \min \ \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} W_{ij} x_j \Big\|_2^2 $$
$$ \text{s.t.} \ W_{ij} = 0 \ \text{if} \ j \notin N(i), \quad \sum_{j=1}^{N} W_{ij} = 1 , $$

where $N(i)$ is the index set of the $k$ nearest neighbors of 
$x_i$. Then LLE aims to preserve $\{W_{ij}\}_{i,j=1}^{N}$ from $\mathcal{X}$ to $\mathcal{Y}$. 
This is achieved by solving the following optimization 
problem 

$$ \min \ \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} W_{ij} y_j \Big\|_2^2 \tag{6} $$
$$ \text{s.t.} \ \frac{1}{N} \sum_{i=1}^{N} y_i y_i^T = I_m . \tag{7} $$

In NPE and NPP, the linear projection assumption (1) 
is used in the above optimization problem, so (6) and (7) 
become (up to the constant factor $1/N$, which only rescales $U$) 

$$ \min \ \mathrm{tr}(U^T X M X^T U) \tag{8} $$
$$ \text{s.t.} \ U^T X X^T U = I_m , \tag{9} $$

where $M = (I_N - W)^T (I_N - W)$ with $W = (W_{ij})$. 
The optimal solutions $u_1, u_2, \ldots, u_m$ are the eigenvectors 
of the following generalized eigenvalue problem 
corresponding to the $m$ smallest eigenvalues 

$$ X M X^T u_i = \lambda_i X X^T u_i . $$

After finding the linear projection matrix $U = [u_1 \ u_2 \ \cdots \ u_m]$, 
any new data sample $x$ from the high-dimensional 
space $\mathbb{R}^n$ can be easily mapped into the 
lower dimensional space $\mathbb{R}^m$ by $y = U^T x$. 

2.2.3 OLPP and ONPP 

Orthogonal Locality Preserving Projections (OLPP) [28] 
and Orthogonal Neighborhood Preserving Projections 
(ONPP) [29], [30] are the same as LPP and NPE (or NPP), 
respectively, except that the linear projection matrix 
provided by LPP and NPE (or NPP) is restricted to be 
orthogonal. This is achieved by replacing the constraints 
(5) and (9) with $U^T U = I_m$. Then the optimization 
problems in OLPP and ONPP become 

$$ \text{OLPP:} \quad U_{OLPP} = \arg\min_{U^T U = I_m} \mathrm{tr}(U^T X L X^T U) \tag{10} $$
$$ \text{ONPP:} \quad U_{ONPP} = \arg\min_{U^T U = I_m} \mathrm{tr}(U^T X M X^T U) . \tag{11} $$

Unlike in the cases of LPP and NPE (or NPP), these 
two optimization problems lead to standard eigenvalue 
problems, which are much easier to solve numerically than a 
generalized eigenvalue problem. The column vectors of 
$U_{OLPP}$ are given by the eigenvectors of $X L X^T$ 
corresponding to the $m$ smallest eigenvalues. The same result 
holds for $U_{ONPP}$ by replacing $X L X^T$ with $X M X^T$. The 
reader is referred to [28] and [29], [30] for details of these 
two algorithms. 
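
As a rough illustration of (11), the following Python sketch computes an orthogonal projection from the eigenvectors of $X M X^T$. It is only a sketch under stated assumptions: the helper name `onpp_projection` is hypothetical, and the weight matrix `W` is a random row-normalised placeholder rather than true LLE reconstruction weights.

```python
import numpy as np

def onpp_projection(X, W, m):
    """Columns of U: eigenvectors of X M X^T for the m smallest eigenvalues."""
    N = X.shape[1]
    I = np.eye(N)
    M = (I - W).T @ (I - W)              # M = (I_N - W)^T (I_N - W)
    S = X @ M @ X.T                      # symmetric n x n matrix
    _, eigvecs = np.linalg.eigh(S)       # eigenvalues returned in ascending order
    return eigvecs[:, :m]                # orthonormal columns, so U^T U = I_m

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 40))             # 40 samples in R^5, stored as columns
W = rng.random((40, 40))                 # placeholder weights (assumption, not LLE weights)
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)     # each row sums to one
U = onpp_projection(X, W, m=2)
y_new = U.T @ rng.normal(size=5)         # out-of-sample mapping y = U^T x
```

Because `numpy.linalg.eigh` returns orthonormal eigenvectors, the orthogonality constraint holds by construction and no generalized eigenvalue solver is needed.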

2.3 Out-of-Sample Nonlinear Extensions for Manifold 
Learning 

Besides linear projections for manifold learning, several 
out-of-sample nonlinear extensions have also been proposed 
in order to obtain low-dimensional 
representations of unseen data samples from the learned 
manifold. These methods are based on kernel functions 
and extrapolation techniques. A common strategy taken 
by these methods is to reformulate manifold learning 
methods as kernel learning problems. Extrapolation 
techniques are then employed to find the location of new 
samples in the low-dimensional space from the 
learned manifold. Bengio et al. [32], [36] proposed a unified 
framework for extending LLE, ISOMAP and LE, in 
which these methods are seen as learning eigenfunctions 
of operators defined from data-dependent kernels. The 
data-dependent kernels are implicitly defined by LLE, 
ISOMAP and LE and are used together with the Nyström 
formula [38] to extrapolate the embedding of a manifold 
learned from finite training samples to new 
samples (see [32], [36]). Chin and 
Suter [35] investigated the equivalence between MVU 
and Kernel Principal Component Analysis (KPCA) [39], 
by which extending MVU to new samples is reduced 
to extending a kernel matrix. In their work [35], the 
kernel matrix is generated from an unknown kernel 
eigenfunction which is approximated using Gaussian 
basis functions. A framework for efficient kernel 
extrapolation, based on a 
matrix approximation theorem and an extension of the 
representer theorem, has also been proposed. Under this framework, LLE was 
reformulated and the issue of extending LLE to new data 
samples was addressed in [33]. 

3 Explicit Nonlinear Mappings for 
Manifold Learning 

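In this section, we propose an explicit nonlinear mapping 
for manifold learning, based on the assumption 
that there is a polynomial mapping between the 
high-dimensional data samples and their lower dimensional 
representations. Precisely, given input samples 
$x_1, x_2, \ldots, x_N$ and their low-dimensional representations 
$y_1, y_2, \ldots, y_N$, we assume that there exists a polynomial 
mapping which maps $\mathcal{X}$ to $\mathcal{Y}$, that is, the $k$-th component 
$y_i^k$ of $y_i$ is a polynomial of degree $p$ with respect to $x_i$ in 
the following manner: 

$$ y_i^k = \sum_{\substack{l_1, l_2, \ldots, l_n \ge 0 \\ 1 \le l_1 + l_2 + \cdots + l_n \le p}} v_k^{\mathbf{l}} \, (x_i^1)^{l_1} (x_i^2)^{l_2} \cdots (x_i^n)^{l_n} , \tag{12} $$

where $l_1, l_2, \ldots, l_n$ are all integers. The superscript $\mathbf{l}$ 
stands for the $n$-tuple indexing array $(l_1, l_2, \ldots, l_n)$ and 
$v_k$ is the vector of polynomial coefficients which is 
defined by 

$$ v_k = \begin{pmatrix}
v_k^{\mathbf{l}} \big|_{l_1 = p,\, l_2 = 0,\, \ldots,\, l_n = 0} \\
v_k^{\mathbf{l}} \big|_{l_1 = p-1,\, l_2 = 1,\, \ldots,\, l_n = 0} \\
\vdots \\
v_k^{\mathbf{l}} \big|_{l_1 = 1,\, l_2 = 0,\, \ldots,\, l_n = 0} \\
\vdots \\
v_k^{\mathbf{l}} \big|_{l_1 = 0,\, l_2 = 0,\, \ldots,\, l_n = 1}
\end{pmatrix} . \tag{13} $$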



By assuming the polynomial mapping relationship, we 
aim to find a polynomial approximation to the unknown 
mapping from the high-dimensional data samples into 
their low-dimensional embedding space. Compared with 
the linear projection assumption used previously, a poly- 
nomial mapping provides high-order approximation to 
the unknown nonlinear mapping and therefore is more 
accurate for data samples lying on nonlinear manifolds. 

In order to apply this explicit nonlinear mapping to 
manifold learning algorithms, we need two definitions 
from matrix analysis [40]. 

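Definition 3.1: The Kronecker product of an $m \times n$ 
matrix $A$ and a $p \times q$ matrix $B$ is defined as 

$$ A \otimes B = \begin{pmatrix}
a_{11} B & \cdots & a_{1n} B \\
\vdots & \ddots & \vdots \\
a_{m1} B & \cdots & a_{mn} B
\end{pmatrix} , $$

which is an $mp \times nq$ matrix. 

Definition 3.2: The Hadamard product of two $m \times n$ 
matrices $A$ and $B$ is defined as 

$$ A \odot B = \begin{pmatrix}
a_{11} b_{11} & \cdots & a_{1n} b_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} b_{m1} & \cdots & a_{mn} b_{mn}
\end{pmatrix} . $$
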
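
The two definitions can be checked numerically. The short Python sketch below uses NumPy's `np.kron` for the Kronecker product and entrywise multiplication for the Hadamard product; the matrices `A` and `B` are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Kronecker product: every entry a_ij is replaced by the block a_ij * B,
# so two 2 x 2 matrices give a 4 x 4 result.
kron_AB = np.kron(A, B)

# Hadamard product: entrywise multiplication of two equally sized matrices.
hadamard_AB = A * B

print(kron_AB)
print(hadamard_AB)
```

The Kronecker product grows multiplicatively in size, whereas the Hadamard product keeps the size of its operands.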
Recently, it was proved in [31] that most manifold 
learning methods, including LLE, LE, and ISOMAP, can 
be cast into the framework of spectral embedding. Under 
this framework, finding the low-dimensional embedding 
representations of the high-dimensional data samples is 
reduced to solving the following optimization problem 

$$ \min \ \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \|y_i - y_j\|_2^2 \tag{14} $$
$$ \text{s.t.} \ \sum_{i=1}^{N} D_i y_i y_i^T = I_m , \tag{15} $$

where $W_{ij}$, $i,j = 1, 2, \ldots, N$, are positive weights which 
can be defined by using the input data samples and $D_i = \sum_{j=1}^{N} W_{ij}$. 

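Applying the polynomial assumption (12) to the above 
general model of manifold learning gives a general 
manifold learning algorithm with an explicit nonlinear 
mapping. Denote $(x_i^1)^{l_1} (x_i^2)^{l_2} \cdots (x_i^n)^{l_n}$ by $x_i^{\mathbf{l}}$ and 
substitute (12) into (14). Then the objective function becomes 

$$ \begin{aligned}
\frac{1}{2} \sum_{i,j} W_{ij} \|y_i - y_j\|_2^2
&= \frac{1}{2} \sum_{i,j} W_{ij} \sum_{k} (y_i^k - y_j^k)^2 \\
&= \sum_{k} \Big( \sum_{i} D_i (y_i^k)^2 - \sum_{i,j} W_{ij} \, y_i^k y_j^k \Big) \\
&= \sum_{k} \Big( \sum_{i} D_i \Big( \sum_{\mathbf{l}} v_k^{\mathbf{l}} x_i^{\mathbf{l}} \Big)^2
 - \sum_{i,j} W_{ij} \Big( \sum_{\mathbf{l}} v_k^{\mathbf{l}} x_i^{\mathbf{l}} \Big) \Big( \sum_{\mathbf{l}} v_k^{\mathbf{l}} x_j^{\mathbf{l}} \Big) \Big) .
\end{aligned} \tag{16} $$

Substitute (12) into (15), so the constraint is transformed 
into 

$$ \sum_{i} D_i \, y_i y_i^T = I_m , \quad \text{with} \quad y_i^k = \sum_{\mathbf{l}} v_k^{\mathbf{l}} x_i^{\mathbf{l}} . $$

This is equivalent to 

$$ \sum_{i} D_i \Big( \sum_{\mathbf{l}} v_j^{\mathbf{l}} x_i^{\mathbf{l}} \Big) \Big( \sum_{\mathbf{l}} v_k^{\mathbf{l}} x_i^{\mathbf{l}} \Big) = \delta_{jk} , \tag{17} $$

where $\delta_{jk} = 1$ for $j = k$ and $\delta_{jk} = 0$ otherwise. 

In order to simplify (16) and (17), we define $X_p^{(i)}$ by 

$$ X_p^{(i)} = \begin{pmatrix}
\overbrace{x_i \otimes x_i \otimes \cdots \otimes x_i}^{p} \\
\vdots \\
x_i \otimes x_i \\
x_i
\end{pmatrix} . \tag{18} $$

Then $\sum_{\mathbf{l}} v_k^{\mathbf{l}} x_i^{\mathbf{l}} = v_k^T X_p^{(i)}$, so (16) and (17) are reduced, 
respectively, to 

$$ \min \ \sum_{k} \Big( \sum_{i} v_k^T X_p^{(i)} D_i (X_p^{(i)})^T v_k
 - \sum_{i,j} v_k^T X_p^{(i)} W_{ij} (X_p^{(j)})^T v_k \Big) \tag{19} $$
$$ \text{s.t.} \ \sum_{i} v_j^T X_p^{(i)} D_i (X_p^{(i)})^T v_k = \delta_{jk} . \tag{20} $$

By writing $X_p = [X_p^{(1)} \ X_p^{(2)} \ \cdots \ X_p^{(N)}]$, (19) and (20) can 
be further simplified to 

$$ \min \ \sum_{k} v_k^T X_p (D - W) X_p^T v_k \tag{21} $$
$$ \text{s.t.} \ v_j^T X_p D X_p^T v_k = \delta_{jk} , \tag{22} $$

where $W = (W_{ij})$ and $D$ is a diagonal matrix whose $i$-th 
diagonal entry is $D_i$. 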

By the Rayleigh-Ritz Theorem [40], the optimal solutions 
$v_k$, $k = 1, 2, \ldots, m$, are the eigenvectors of the following 
generalized eigenvalue problem corresponding 
to the $m$ smallest eigenvalues 

$$ X_p (D - W) X_p^T v_i = \lambda X_p D X_p^T v_i , \quad v_i^T X_p D X_p^T v_j = \delta_{ij} . \tag{23} $$

Once $v_k$, $k = 1, 2, \ldots, m$, are computed, the explicit 
nonlinear mapping from the high-dimensional data samples 
to the low-dimensional embedding space $\mathbb{R}^m$ can be 
given as 

$$ y = \begin{pmatrix}
\sum_{\mathbf{l}} v_1^{\mathbf{l}} \, (x^1)^{l_1} (x^2)^{l_2} \cdots (x^n)^{l_n} \\
\vdots \\
\sum_{\mathbf{l}} v_m^{\mathbf{l}} \, (x^1)^{l_1} (x^2)^{l_2} \cdots (x^n)^{l_n}
\end{pmatrix} , \tag{24} $$

where $x$ is a high-dimensional data sample and $y$ is its 
low-dimensional representation. For a new sample 
$x_{new}$, its location in the low-dimensional embedding 
manifold can be simply obtained by 

$$ y_{new} = \big( v_1^T X_p^{(new)}, \ v_2^T X_p^{(new)}, \ \ldots, \ v_m^T X_p^{(new)} \big)^T , \tag{25} $$

where $X_p^{(new)}$ is defined in the same way as in (18). 
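
To make the out-of-sample step (25) concrete, the Python sketch below builds the feature vector of a sample by stacking its Kronecker powers and applies a coefficient matrix to it. It assumes that (18) stacks the powers from degree $p$ down to degree 1; the function names `kron_features` and `embed` and the coefficient matrix `V` are hypothetical illustrations, not learned values.

```python
import numpy as np

def kron_features(x, p):
    """Stack the Kronecker powers of x for degrees p, p-1, ..., 1 into one vector."""
    x = np.asarray(x, dtype=float)
    powers = [x]                                  # degree 1
    for _ in range(p - 1):
        powers.append(np.kron(powers[-1], x))     # next Kronecker power
    return np.concatenate(powers[::-1])           # highest degree first

def embed(x_new, V, p):
    """y_new = V^T X_p^(new), where the columns of V are the vectors v_k."""
    return V.T @ kron_features(x_new, p)

x = np.array([1.0, 2.0])
feats = kron_features(x, p=2)                     # [x (kron) x ; x]

# Hypothetical coefficients: y = 0.5 * (x^1)^2 + x^1
V = np.zeros((feats.size, 1))
V[0, 0] = 0.5                                     # coefficient of (x^1)^2
V[-2, 0] = 1.0                                    # coefficient of x^1
y = embed(x, V, p=2)
```

Once the coefficient vectors are known, embedding a new sample costs one feature construction and one matrix-vector product, with no retraining.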

In the next section, we will make use of a similar 
method as in LLE to define the weights $W_{ij}$, $i,j = 1, 2, \ldots, N$, 
so that the geometry of the neighborhood of 
each data point can be captured. 

4 Neighborhood Preserving Polynomial 
Embedding 

In this section, we propose a new manifold learning algorithm 
with an explicit nonlinear mapping, named Neighborhood 
Preserving Polynomial Embedding (NPPE), 
which is obtained by defining the weights $W_{ij}$, $i,j = 1, 2, \ldots, N$, 
in a way similar to the LLE method and 
combining them with the explicit nonlinear mapping 
given in Section 3. 

4.1 NPPE 

Consider a data set $\{x_1, x_2, \ldots, x_N\}$ from the 
high-dimensional space $\mathbb{R}^n$. NPPE starts with finding a set of 
linear reconstruction weights which can best reconstruct 
each data point $x_i$ by its $k$-nearest neighbors ($k$-NNs). 
This step is identical with that of LLE [2], [3]. The 
weights $R_{ij}$, $i,j = 1, 2, \ldots, N$, which are defined to be 
nonzero only if $x_j$ is among the $k$-NNs of $x_i$, are 
computed by solving the following optimization problem 

$$ R = \arg\min_{\sum_{j=1}^{N} R_{ij} = 1} \ \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} R_{ij} x_j \Big\|_2^2 . \tag{26} $$

The weights $R_{ij}$ represent the linear coefficients for 
reconstructing the sample $x_i$ from its neighbors $\{x_j\}$, 
whilst the constraint $\sum_{j=1}^{N} R_{ij} = 1$ means that $x_i$ is 
approximated by an affine combination of its neighbors, 
that is, with weights summing to one. 
The weight matrix, $R = (R_{ij})$, has a closed-form solution 
given by 

$$ r_i = \frac{G^{-1} e}{e^T G^{-1} e} , \tag{27} $$

where $r_i$ is a column vector formed by the $k$ non-zero 
entries in the $i$-th row of $R$ and $e$ is a column vector 
of all ones. The $(j,l)$-th entry of the $k \times k$ matrix $G$ is 
$(x_j - x_i)^T (x_l - x_i)$, where $x_j$ and $x_l$ are among the $k$-NNs 
of $x_i$. 

NPPE aims to preserve the reconstruction weights $R_{ij}$ 
from the high-dimensional input data samples to their 
low-dimensional representations under the polynomial 
mapping assumption. This is achieved by solving the 
following optimization problem 

$$ \mathcal{Y} = \arg\min \ \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} R_{ij} y_j \Big\|_2^2 , \tag{28} $$

where each $y_i$ satisfies (12). 

By a simple algebraic calculation, it can be shown that 
(28) is equivalent to (14) and (15) with 

$$ W_{ij} = R_{ij} + R_{ji} - \sum_{k=1}^{N} R_{ki} R_{kj} , \quad \text{and} \quad D_i = 1 . \tag{29} $$

By the result in Section 3, the explicit nonlinear mapping 
can be obtained by solving (23) and the low-dimensional 
representations $\mathcal{Y}$ of $\mathcal{X}$ can be computed by applying 
(24) to $\mathcal{X}$. For a new sample $x_{new}$, its 
low-dimensional representation can be simply given by (25). 

We conclude this section by summarizing the NPPE 
algorithm in Algorithm 1. 

Algorithm 1: The NPPE Algorithm 

Input: Data matrix $X$, the number $k$ of nearest 
neighbors and the polynomial degree $p$. 
Output: Vectors of polynomial coefficients $v_i$, 
$i = 1, 2, \ldots, m$. 
1: Compute $R_{ij}$ by (27). 
2: Compute $W$ and $D$ by (29). 
3: Generate $X_p$ according to (18). 
4: Solve the generalized eigenvalue problem (23) to 
get $v_i$, $i = 1, 2, \ldots, m$. 

Algorithm 2: The Simplified NPPE Algorithm 

Input: Data matrix $X$, the number $k$ of nearest 
neighbors and the polynomial degree $p$. 
Output: Vectors of polynomial coefficients $v_i$, 
$i = 1, 2, \ldots, m$. 
1: Compute $R_{ij}$ by (27). 
2: Compute $W$ and $D$ by (29). 
3: Generate $X_p$ according to (30). 
4: Solve the generalized eigenvalue problem (23) to 
get $v_i$, $i = 1, 2, \ldots, m$. 
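
The steps of the simplified algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the regulariser added to the local Gram matrix $G$ (needed when $k > n$ makes $G$ singular), the small ridge added to $X_p D X_p^T$, and the synthetic Swiss-roll-like data are all assumptions introduced here.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights R (N x N) for the columns of X (n x N), cf. (26)-(27)."""
    N = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X        # pairwise squared distances
    R = np.zeros((N, N))
    for i in range(N):
        order = np.argsort(d2[i])
        nbrs = order[order != i][:k]                      # k nearest neighbours of x_i
        Z = X[:, nbrs] - X[:, [i]]                        # neighbours centred at x_i
        G = Z.T @ Z                                       # k x k local Gram matrix
        G = G + reg * (np.trace(G) + 1e-12) / k * np.eye(k)   # assumed regulariser
        w = np.linalg.solve(G, np.ones(k))                # proportional to G^{-1} e
        R[i, nbrs] = w / w.sum()                          # divide by e^T G^{-1} e
    return R

def hadamard_features(X, p):
    """Stack entrywise powers X^p, ..., X^2, X, cf. (30); result is (n*p) x N."""
    return np.vstack([X ** q for q in range(p, 0, -1)])

def snppe_fit(X, k, p, m, ridge=1e-8):
    """Return V whose columns are the coefficient vectors v_1, ..., v_m."""
    N = X.shape[1]
    R = lle_weights(X, k)
    W = R + R.T - R.T @ R                                 # (29), with D = I
    Xp = hadamard_features(X, p)
    A = Xp @ (np.eye(N) - W) @ Xp.T                       # X_p (D - W) X_p^T
    B = Xp @ Xp.T                                         # X_p D X_p^T
    B = B + ridge * np.trace(B) / B.shape[0] * np.eye(B.shape[0])  # assumed ridge
    L = np.linalg.cholesky(B)                             # B = L L^T
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T     # L^{-1} A L^{-T}
    _, Q = np.linalg.eigh((C + C.T) / 2.0)                # ascending eigenvalues
    return np.linalg.solve(L.T, Q[:, :m])                 # v = L^{-T} q

def snppe_transform(X_new, V, p):
    """Embed new samples (columns of X_new) by y = V^T X_p, cf. (25)."""
    return V.T @ hadamard_features(X_new, p)

rng = np.random.default_rng(0)
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=300)
X = np.vstack([t * np.cos(t), rng.uniform(0, 20, size=300), t * np.sin(t)])
X = X / np.abs(X).max()                                   # rescale for conditioning
V = snppe_fit(X[:, :200], k=8, p=2, m=2)
Y_train = snppe_transform(X[:, :200], V, p=2)
Y_test = snppe_transform(X[:, 200:], V, p=2)
```

The generalized eigenvalue problem is reduced to a standard symmetric one through a Cholesky factor of the right-hand matrix, so only NumPy is required; a dedicated generalized symmetric eigensolver could be used instead.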



4.2 Computational Complexity and Simplified NPPE 

In the training procedure of NPPE, the computational 
complexity of generating Xp is 0{N J2^=2''^^)- Comput- 
ing XpWX^ and XpDX^ takes 0(fciV^ Er=i ^nd 
0{N^ X]r=i '^*) operations, respectively, since there are 
only k non-zero entries in each column of W and D is 
a diagonal matrix. The computational complexity of the 
final eigen-decomposition is 0{m{J2^^i "'')^)/ which is 
the most time-consuming step. 

In the procedure of locating new samples with NPPE, 
generating $X_p^{(new)}$ takes $O(\sum_{i=2}^{p} n^i)$ operations and 
computing $y_{new}$ takes $O(m (\sum_{i=1}^{p} n^i)^2)$ operations. 

From the above analysis, it can be seen that, as the 
polynomial order $p$ increases, the overall computational 
complexity increases exponentially with $p$, which would 
be extremely time-consuming when the data dimension 
is very high. To address this issue, we simplify NPPE by 
removing the cross terms. This is achieved by replacing 
the Kronecker product in (18) with the Hadamard 
product 

$$ X_p^{(i)} = \begin{pmatrix}
\overbrace{x_i \odot x_i \odot \cdots \odot x_i}^{p} \\
\vdots \\
x_i \odot x_i \\
x_i
\end{pmatrix} . \tag{30} $$

With this strategy, the computational complexity of 
generating $X_p$ is reduced to $O(np(p+1)/2)$, whilst the 
computational complexity of computing $y_{new}$ is reduced to 
$O(mn^2p^2)$. The Simplified NPPE (SNPPE) is summarized 
in Algorithm 2. 
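
The saving from dropping the cross terms can be seen by counting feature-vector lengths. The Python sketch below compares the length of the stacked Kronecker-power vector, $n + n^2 + \cdots + n^p$, with that of the stacked Hadamard-power vector, $np$; the helper names are hypothetical, and the value $n = 560$ corresponds to a 28 x 20 image.

```python
def kron_dim(n, p):
    """Length of the stacked Kronecker-power vector: n + n^2 + ... + n^p."""
    return sum(n ** q for q in range(1, p + 1))

def hadamard_dim(n, p):
    """Length of the stacked Hadamard-power vector: n * p."""
    return n * p

for n, p in [(3, 2), (3, 3), (560, 2)]:
    print(n, p, kron_dim(n, p), hadamard_dim(n, p))
```

For image-sized inputs the Kronecker form already needs hundreds of thousands of coefficients per output dimension at $p = 2$, while the Hadamard form grows only linearly in $p$.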

Finally, the computational complexity of SNPPE, linear 
methods and kernel methods on computing $y_{new}$ is 
summarized in Table 2. The computational complexity 
of different kernel methods varies. Here we only state 
the computational complexity of the common step of 
computing the inner products, so the total 
complexity of computing $y_{new}$ with a kernel method is not less than this value. 

TABLE 2 
Computational complexity of SNPPE, linear methods and 
kernel methods on computing the low-dimensional 
representation of a new sample. 

Methods       SNPPE            Linear        Kernel 
Complexity    $O(mn^2p^2)$     $O(mn^2)$     $O(n^2N^2)$ 

4.3 Discussion 

In this subsection, we briefly explain why NPPE or 
SNPPE has a better performance than its linear 
counterparts for nonlinearly distributed data sets. 

Let $f = (f^1, f^2, \ldots, f^m)$ be a nonlinear map from a 
manifold $\mathcal{M} \subset \mathbb{R}^n$ to $\mathbb{R}^m$ such that $y_i^k = f^k(x_i)$, where 
$f^k$ is at least $p$th-order differentiable. For simplicity, and 
without loss of generality, we may assume that $0 \in \mathcal{M}$ 
and that $f(0) = 0$. Then the Taylor expansion of $f^k(x)$ 
at zero is given by 

$$ f^k(x) = (\nabla f^k(0))^T x + \frac{1}{2} x^T H_{f^k}(0) \, x + o(\|x\|^2) , \tag{31} $$

where $\nabla f^k$ and $H_{f^k}$ are the gradient and Hessian of 
$f^k$, respectively. From (31), it can be seen that the linear 
methods only use the first-order approximation provided 
by $\nabla f^k(0)$ to approximate the nonlinear mapping 
$f^k(x)$, while the proposed polynomial mapping contains 
the extra high-order terms. Therefore, the explicit nonlinear 
mapping based on the polynomial assumption gives 
a better approximation to the true nonlinear mapping $f$ 
than the explicit linear one. 
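
The argument can be illustrated numerically with a least-squares fit. In the Python sketch below, the target function `y` is a hypothetical smooth map with $f(0) = 0$ chosen only for illustration; a linear model and a degree-2 polynomial model are fitted to the same samples and their root-mean-square residuals are compared.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 500))             # samples in R^2, stored as columns
y = np.sin(X[0]) + X[0] * X[1] + 0.5 * X[1] ** 2      # hypothetical nonlinear f, f(0) = 0

def fit_residual(features, target):
    """Root-mean-square residual of the least-squares fit target ~ coef^T features."""
    coef, *_ = np.linalg.lstsq(features.T, target, rcond=None)
    return np.sqrt(np.mean((features.T @ coef - target) ** 2))

lin = X                                               # first-order terms only
quad = np.vstack([X, X[0] ** 2, X[0] * X[1], X[1] ** 2])   # adds second-order terms
err_lin = fit_residual(lin, y)
err_quad = fit_residual(quad, y)
print(err_lin, err_quad)
```

Because the quadratic feature set contains the linear one, its least-squares residual can never be larger, and it is strictly smaller whenever the target has second-order content.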

5 Experimental Tests 

In this section, experiments on both synthetic and real-world 
data sets are conducted to illustrate the validity 
and effectiveness of the proposed NPPE algorithm. In 
Section 5.1, NPPE is tested on recovering geometric 
structures of surfaces embedded in $\mathbb{R}^3$. In Section 5.2, 
NPPE is applied to locating new data samples in 
the learned low-dimensional space. In Section 5.3, NPPE 
is used to extract intrinsic degrees of freedom underlying 
two image manifolds. In the experiments, the simplified 
version of NPPE is implemented and compared with 
NPP [27] and ONPP (which apply the linear and 
orthogonal linear projection mapping to the training 
procedure for LLE, respectively) as well as the kernel 
extrapolation (KE) method proposed in [33]. 

There are two parameters in the NPPE algorithm, 
the number $k$ of nearest neighbors and the polynomial 
degree $p$. $k$ is usually set to be 1% of the number of 
training samples, and the experimental tests show that 
NPPE is stable around this number. The choice of $p$ 
depends on the dimension $m$. When $m$ is small, $p$ can be 
large to make NPPE more accurate. When $m$ is large, $p$ 
should be small to make NPPE computationally efficient. 
Experiments show that NPPE with $p = 2$ is already 
sufficiently accurate in our tests. 

5.1 Learning Surfaces in $\mathbb{R}^3$ with NPPE 

In the first experiment, NPPE, NPP, ONPP and 
LLE are applied to the task of unfolding surfaces 
embedded in $\mathbb{R}^3$. The surfaces are the SwissRoll, 
SwissHole, and Gaussian, all of which are 
generated by the Matlab Demo available at 
http://www.math.umn.edu/~wittman/mani/. On each 
manifold, 1000 data samples are randomly generated for 
training. The number of nearest neighbors is $k = 10$ and 
the polynomial degree is $p = 2$. The experimental results 
are shown in Fig. 1. In each sub-figure, $Z = [z_1 \ z_2 \ \cdots \ z_N]$ 
stands for the generating data such that $x_i = \phi(z_i)$, 
where $\phi$ is the nonlinear mapping that embeds $Z$ in $\mathbb{R}^3$. 
It can be seen from Fig. 1 that NPPE outperforms all 
the other three methods, even the LLE method itself. 
NPP and ONPP fail to unfold these nonlinear manifolds 
(except for ONPP on Gaussian). 

Furthermore, in order to estimate the similarity between 
the learned low-dimensional representations and 
the generating data, the residual variance [4] $\rho(Y, Z) = 
1 - R^2(Y, Z)$ is computed, where $R$ is the standard linear 
correlation coefficient taken over all entries of $Y$ and $Z$. 
The lower $\rho(Y, Z)$ is, the more similar $Y$ and $Z$ are. The 
estimation results are shown in Fig. 1(d). It can be seen 
that the embedding given by NPPE is the most similar 
one. 
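
The residual variance is straightforward to compute. The Python sketch below reads "taken over all entries" literally, flattening both matrices before computing the correlation coefficient; the function name `residual_variance` and the small test matrix `Z` are illustrative assumptions.

```python
import numpy as np

def residual_variance(Y, Z):
    """rho(Y, Z) = 1 - R^2, with R the linear correlation over all entries of Y and Z."""
    r = np.corrcoef(np.ravel(Y), np.ravel(Z))[0, 1]
    return 1.0 - r ** 2

Z = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 2.0, 5.0]])
rho_same = residual_variance(2.0 * Z + 1.0, Z)    # scaled and shifted copy of Z
rho_other = residual_variance(Z[:, ::-1], Z)      # columns in reversed order
print(rho_same, rho_other)
```

A scaled and shifted copy of the generating data gives a residual variance of zero up to rounding, since the correlation coefficient is invariant to such changes.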

5.2 Locating New Data Samples with NPPE 

In the second experiment, we apply NPPE, NPP, ONPP 
and KE to locating new samples in the learned 
low-dimensional space. First, 2000 data samples which 
are evenly distributed on the SwissRoll manifold are 
generated. Then 1000 samples are randomly selected as the 
training data to learn the mapping relationship from 
$\mathbb{R}^3$ to $\mathbb{R}^2$ by NPPE, NPP, ONPP and KE. The learned 
mappings are used to provide the low-dimensional 
representations for the remaining 1000 samples. The time cost of 
computing the low-dimensional representations of the 
testing samples is also recorded. Experimental results are 
shown in Fig. 2. It can be seen that NPPE not only gives 
the best locating result but also has a much lower time 
cost than KE. NPP and ONPP are faster to compute 
but fail to give the correct embedding result. The same 
experiment is also conducted on data samples randomly 
selected from SwissRoll. The results are shown in Fig. 
3. NPPE also outperforms the other three methods. 

To further validate the performance of NPPE, we
randomly generate 11000 samples on the SwissRoll
manifold, 1000 for training and 10000 for testing. The
experimental procedure is the same as in the preceding
experiment. Time cost versus number of testing samples is
shown in Fig. 4(a). The residual variances between the
generating data of the testing samples and their
low-dimensional representations given by the four methods
are illustrated in Fig. 4(b). The experimental results show
that NPPE is more accurate than the other three
methods, at a computational cost similar to that of NPP
and ONPP. Note that, in all the above experiments, the
time cost of KE increases linearly with the number
of testing samples, whilst that of NPP, ONPP
and NPPE stays almost unchanged as the
number of testing samples grows.
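
To make the low cost of an explicit mapping concrete, the sketch below evaluates a polynomial mapping on a new sample. It is a minimal illustration under stated assumptions, not the NPPE implementation: the second-order feature construction, the function names `poly_features` and `apply_mapping`, and the coefficient matrix `V` are all hypothetical. The point it shows is that locating a new sample only requires forming its monomials and taking inner products with learned coefficients, with no access to the training samples.

```python
from itertools import combinations_with_replacement


def poly_features(x, degree=2):
    """All monomials of the entries of x with total degree 1..degree."""
    feats = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in combo:
                term *= x[i]
            feats.append(term)
    return feats


def apply_mapping(V, x, degree=2):
    """Map one sample x to its low-dimensional representation V * phi(x)."""
    phi = poly_features(x, degree)
    return [sum(v * p for v, p in zip(row, phi)) for row in V]


# Hypothetical coefficients for a map from 3-D to 2-D. Feature order:
# x1, x2, x3, x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2
V = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5],  # y1 = x1 + 0.5 * x3^2
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # y2 = x2
]

print(apply_mapping(V, (1.0, 2.0, 3.0)))  # [5.5, 2.0]
```

The per-sample work depends only on the input dimension, the polynomial degree and the output dimension, which is consistent with the observation that the time cost of the explicit methods is small compared with that of KE.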



5.3 Learning Image Manifolds with NPPE 

In the last experiment, NPPE is applied to extract the intrinsic
degrees of freedom underlying two image manifolds,
lleface [2] and usps-0.

The lleface data set consists of 1965 face images of the same
person at resolution 28 x 20, and the two intrinsic degrees
of freedom underlying the face images are rotation of
the head and facial emotion. We randomly select 1500
samples as the training data and 400 samples as the
testing data. The number of nearest neighbors is set to
15. The experimental results are shown in Fig. 5, with
the training and testing results on the left
and right columns, respectively. 100 training
samples and 40 testing samples are randomly selected
and attached to the learned embedding. It can be seen
that NPPE and NPP have successfully recovered the
underlying structure of lleface, while the result given
by KE is not satisfactory: the rotation degree of freedom is not
extracted by the embedding learned with KE. The time cost
of locating new data samples by these three methods is
shown in Fig. 7(a). The time cost of NPPE is higher than
that of NPP but lower than that of KE, which supports
the analysis of computational complexity in Section 4.2.

The usps-0 data set consists of 765 images of the
handwritten digit '0' at resolution 16 x 16, and the two
underlying intrinsic degrees of freedom are the line
width and the shape of '0'. 600 samples are randomly
selected as training data and 150 samples are chosen
as testing data. The number of nearest neighbors is
set to 5. Fig. 6 illustrates the experimental results.
Training and testing results are shown on the left and
right columns, respectively. 100 training samples and
20 testing samples are randomly selected and shown in
the learned embedding. It can be seen that NPPE has
successfully recovered the underlying structure, while
it is hard to see the changes of line width and shape
in the embedding given by KE and ONPP. The time cost
of locating new data samples by these three methods is
shown in Fig. 7(b). The time cost of NPPE is higher than
that of ONPP but much lower than that of KE.



6 Conclusion 

In this paper, an explicit nonlinear mapping for manifold
learning is proposed for the first time. Based
on the assumption that there is a polynomial mapping
from the high-dimensional input samples to their
low-dimensional representations, an explicit polynomial
mapping is obtained by applying this assumption to a
generic model of manifold learning. The resulting
NPPE algorithm is a nonlinear dimensionality reduction
technique with an explicit nonlinear mapping, which
tends to preserve not only the locality but also the
nonlinear geometry of the high-dimensional data samples.
NPPE provides convincing embedding results and, at the
same time, locates new data samples in the reduced
low-dimensional space simply and quickly.
Experimental tests on both synthetic and real-world data
have validated the effectiveness of the proposed NPPE
algorithm.

Fig. 2. Experiment on locating new samples for uniformly distributed SwissRoll data. (a) Training data and their
generating data. (b) Time cost versus number of testing samples. (c) Locating results by NPPE. (d) Locating results
by NPP. (e) Locating results by ONPP. (f) Locating results by KE. In (c)-(f), iV = stands for the training result. 

Fig. 3. Experiment on locating new samples for randomly distributed SwissRoll data. (a) Training data and their
generating data. (b) Time cost versus number of testing samples. (c) Locating results by NPPE. (d) Locating results
by NPP. (e) Locating results by ONPP. (f) Locating results by KE. In (c)-(f), iV = stands for the training result. 




Fig. 4. Experiment on locating new samples for 10000 randomly distributed SwissRoll data. (a) Time cost versus
number of testing samples. (b) Residual variance versus number of testing samples.

Fig. 5. Experiment on lleface data. Training results are plotted by blue dots while testing results are marked with
filled red circles. (a), (b) Learning and testing results by NPPE. (c), (d) Learning and testing results by NPP. (e), (f)
Learning and testing results by KE.

Fig. 6. Experiment on usps data. Training results are plotted by blue dots while testing results are marked with filled

Fig. 7. Time cost of experiments on image manifold data. (a) Time cost versus number of testing samples on lleface.
(b) Time cost versus number of testing samples on usps.



